Loopy belief propagation performs approximate inference on graphical models
with loops. One might hope to compensate for the approximation by adjusting
model parameters. Learning algorithms for this purpose have been explored
previously, and the claim has been made that every set of locally consistent marginals
can arise from belief propagation run on a graphical model. On the contrary, here
we show that many probability distributions have marginals that cannot be reached
by belief propagation using any set of model parameters or any learning algorithm.
We call such marginals 'unbelievable.' This problem occurs whenever the Hessian
of the Bethe free energy is not positive-definite at the target marginals. All
learning algorithms for belief propagation necessarily fail in these cases, producing
beliefs or sets of beliefs that may even be worse than the pre-learning
approximation. We then show that averaging inaccurate beliefs, each obtained from belief
propagation using model parameters perturbed about some learned mean values,
can achieve the unbelievable marginals.

1 Introduction 

Calculating marginal probabilities for a graphical model generally requires summing over
exponentially many states, and is NP-hard in general [1]. A variety of approximate methods have been used
to circumvent this problem. One popular technique is belief propagation (BP), in particular the
sum-product rule, which is a message-passing algorithm for performing inference on a graphical model
[2]. Though exact and efficient on trees, it is merely an approximation when applied to graphical
models with loops.

A natural question is whether one can compensate for the shortcomings of the approximation by 
setting the model parameters appropriately. In this paper, we prove that some sets of marginals 
simply cannot be achieved by belief propagation. For these cases we provide a new algorithm that 
can achieve much better results by using an ensemble of parameters rather than a single instance. 

We are given a set of variables $x$ with a given probability distribution $P(x)$ of some data. We would
like to construct a model that reproduces certain of its marginal probabilities, in particular those over
individual variables, $p_i(x_i) = \sum_{x \setminus x_i} P(x)$ for nodes $i \in V$, and those over some relevant clusters
of variables, $p_\alpha(x_\alpha) = \sum_{x \setminus x_\alpha} P(x)$ for each cluster $\alpha$ of nodes. We will write the collection of all
these marginals as a vector $p$.
We assume a model distribution $Q_0(x)$ in the exponential family taking the form

$$Q_0(x) = e^{-E(x)} / Z \tag{1}$$

with normalization constant $Z = \sum_x e^{-E(x)}$ and energy function

$$E(x) = -\sum_\alpha \theta_\alpha \cdot \phi_\alpha(x_\alpha) \tag{2}$$

Here, $\alpha$ indexes sets of interacting variables (factors in the factor graph [3]), and $x_\alpha$ is a
subset of variables whose interaction is characterized by a vector of sufficient statistics $\phi_\alpha(x_\alpha)$ and
corresponding natural parameters $\theta_\alpha$. We assume without loss of generality that each $\phi_\alpha(x_\alpha)$ is
irreducible, meaning that it cannot be written as a sum of any linearly independent functions that
themselves do not depend on any $x_i$ for $i \in \alpha$. We collect all these sufficient statistics and natural
parameters in the vectors $\phi$ and $\theta$.

Normally when learning a graphical model, one would fit its parameters so the marginal probabilities 
match the target. Here, however, we will not use exact inference to compute the marginals. Instead 
we will use approximate inference via loopy belief propagation to match the target. 

2 Learning in Belief Propagation 

2.1 Belief propagation 

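The sum-product algorithm for belief propagation on a graphical model with energy function (2)
uses the following equations [4]:

$$m_{i \to \alpha}(x_i) \propto \prod_{\beta \in N_i \setminus \alpha} m_{\beta \to i}(x_i) \qquad m_{\alpha \to i}(x_i) \propto \sum_{x_\alpha \setminus x_i} e^{\theta_\alpha \cdot \phi_\alpha(x_\alpha)} \prod_{j \in N_\alpha \setminus i} m_{j \to \alpha}(x_j) \tag{3}$$

where $N_i$ and $N_\alpha$ are the neighbors of node $i$ or factor $\alpha$ in the factor graph. Once these messages
converge, the single-node and factor beliefs are given by

$$b_i(x_i) \propto \prod_{\alpha \in N_i} m_{\alpha \to i}(x_i) \qquad b_\alpha(x_\alpha) \propto e^{\theta_\alpha \cdot \phi_\alpha(x_\alpha)} \prod_{i \in N_\alpha} m_{i \to \alpha}(x_i) \tag{4}$$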

where the beliefs must each be normalized to one. For tree graphs, these beliefs exactly equal 
the marginals of the graphical model $Q_0(x)$. For loopy graphs, the beliefs at fixed points are
often good approximations of the marginals. While they are guaranteed to be locally consistent,
$\sum_{x_\alpha \setminus x_i} b_\alpha(x_\alpha) = b_i(x_i)$, they are not necessarily globally consistent: There may not exist a single
joint distribution $B(x)$ of which the beliefs are the marginals [5]. This is why the resultant beliefs
are called pseudomarginals, rather than simply marginals. We use a vector b to refer to the set of 
both node and factor beliefs produced by belief propagation. 
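
As a rough illustration of (3) and (4), the following Python sketch runs sum-product on a pairwise model with $x_i \in \{-1, +1\}$ and compares the beliefs with marginals obtained by brute-force enumeration. It is a sketch under stated assumptions: the two message types in (3) are collapsed into a single node-to-node message (equivalent for pairwise factors), and the names `run_bp`, `exact_marginals`, and the example parameters are hypothetical, chosen only for this illustration.

```python
import itertools
import math

STATES = (-1, +1)


def exact_marginals(h, J):
    """P(x_i = +1) for every node, by brute-force enumeration of all states."""
    n = len(h)
    weights = {}
    for x in itertools.product(STATES, repeat=n):
        log_w = sum(h[i] * x[i] for i in range(n))
        log_w += sum(c * x[i] * x[j] for (i, j), c in J.items())
        weights[x] = math.exp(log_w)  # exp(-E(x))
    Z = sum(weights.values())
    return [sum(w for x, w in weights.items() if x[i] == 1) / Z for i in range(n)]


def run_bp(h, J, max_iters=1000, tol=1e-12):
    """Synchronous sum-product; returns the beliefs b_i(x_i = +1)."""
    n = len(h)
    nbrs = {i: [] for i in range(n)}
    coupling = {}
    for (i, j), c in J.items():
        nbrs[i].append(j)
        nbrs[j].append(i)
        coupling[(i, j)] = coupling[(j, i)] = c
    # msgs[(i, j)][s]: message from node i to node j, evaluated at x_j = STATES[s].
    msgs = {(i, j): [0.5, 0.5] for i in nbrs for j in nbrs[i]}
    for _ in range(max_iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in STATES:
                total = 0.0
                for si, xi in enumerate(STATES):
                    incoming = math.prod(msgs[(k, i)][si] for k in nbrs[i] if k != j)
                    total += math.exp(h[i] * xi + coupling[(i, j)] * xi * xj) * incoming
                out.append(total)
            norm = sum(out)
            new[(i, j)] = [v / norm for v in out]
        delta = max(abs(new[e][s] - msgs[e][s]) for e in msgs for s in (0, 1))
        msgs = new
        if delta < tol:
            break
    beliefs = []
    for i in range(n):
        b = [math.exp(h[i] * xi) * math.prod(msgs[(k, i)][si] for k in nbrs[i])
             for si, xi in enumerate(STATES)]
        beliefs.append(b[1] / sum(b))
    return beliefs
```

On a chain such as `h = [0.3, -0.2, 0.1, 0.4]`, `J = {(0, 1): 0.5, (1, 2): -0.7, (2, 3): 0.9}`, the two functions should agree to numerical precision, since the graph is a tree; adding an edge that closes a loop generally breaks this agreement.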

2.2 Bethe free energy 

Despite its limitations, BP is found empirically to work well in many circumstances. Some
theoretical justification for loopy belief propagation emerged with proofs that its stable fixed points are local
minima of the Bethe free energy [6, 7]. Free energies are important quantities in machine learning
because the Kullback-Leibler divergence between the data and model distributions can be expressed 
in terms of free energies, so models can be optimized by minimizing free energies appropriately. 

Given an energy function $E(x)$ from (2), the Gibbs free energy of a distribution $Q(x)$ is

$$F[Q] = U[Q] - S[Q] \tag{5}$$

where $U$ is the average energy of the distribution

$$U[Q] = \sum_x E(x) Q(x) = -\sum_\alpha \theta_\alpha \cdot \sum_{x_\alpha} \phi_\alpha(x_\alpha) q_\alpha(x_\alpha) \tag{6}$$

which depends on the marginals $q_\alpha(x_\alpha)$ of $Q(x)$, and $S$ is the entropy

$$S[Q] = -\sum_x Q(x) \log Q(x) \tag{7}$$
Minimizing the Gibbs free energy $F[Q]$ recovers the distribution $Q_0(x)$ for the graphical model (1).
The Bethe free energy $F^\beta$ is an approximation to the Gibbs free energy,

$$F^\beta[Q] = U[Q] - S^\beta[Q] \tag{8}$$

in which the average energy $U$ is exact, but the true entropy $S$ is replaced by an approximation, the
Bethe entropy $S^\beta$, which is a sum over the factor and node entropies [6]:

$$S^\beta[Q] = \sum_\alpha S_\alpha[q_\alpha] + \sum_i (1 - d_i) S_i[q_i] \tag{9}$$

$$S_\alpha[q_\alpha] = -\sum_{x_\alpha} q_\alpha(x_\alpha) \log q_\alpha(x_\alpha) \qquad S_i[q_i] = -\sum_{x_i} q_i(x_i) \log q_i(x_i) \tag{10}$$

The coefficients $d_i = |N_i|$ are the number of factors neighboring node $i$, and compensate for the
overcounting of single-node marginals due to overlapping factor marginals. For tree-structured
graphical models, which factorize as $Q(x) = \prod_\alpha q_\alpha(x_\alpha) \prod_i q_i(x_i)^{1 - d_i}$, the Bethe entropy is exact,
and hence so is the Bethe free energy. On loopy graphs, the Bethe entropy $S^\beta$ isn't really even an
entropy (e.g. it may be negative) because it neglects all statistical dependencies other than those 
present in the factor marginals. Nonetheless, the Bethe free energy is often close enough to the 
Gibbs free energy that its minima approximate the true marginals 0. Since stable fixed points of 
BP are minima of the Bethe free energy [6, 7], this helped explain why belief propagation is often
so successful. 
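
The exactness of the Bethe entropy (9) on trees, and its failure on loops, can be checked numerically on very small models. The sketch below is only illustrative: it counts pairwise factors alone in $d_i$ (a single-node factor would add and subtract the same node entropy), and the helper names `joint`, `marginal`, `entropy`, and `bethe_entropy` are hypothetical.

```python
import itertools
import math


def joint(h, J):
    """Exact joint distribution of a pairwise +/-1 model, as a dict state -> probability."""
    n = len(h)
    w = {}
    for x in itertools.product((-1, 1), repeat=n):
        energy = -sum(h[i] * x[i] for i in range(n))
        energy -= sum(c * x[i] * x[j] for (i, j), c in J.items())
        w[x] = math.exp(-energy)
    Z = sum(w.values())
    return {x: v / Z for x, v in w.items()}


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(P, idx):
    """Marginal of the joint P over the variables listed in idx."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def bethe_entropy(P, n, edges):
    """Sum of pair entropies plus (1 - d_i)-weighted node entropies, as in (9)."""
    degree = [sum(i in e for e in edges) for i in range(n)]
    s = sum(entropy(marginal(P, e).values()) for e in edges)
    s += sum((1 - degree[i]) * entropy(marginal(P, (i,)).values()) for i in range(n))
    return s
```

For a three-node chain the two entropies should coincide, while for a triangle with all couplings equal to 1 and no biases they differ substantially (the Bethe value is noticeably smaller than the true entropy).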

To emphasize that the Bethe free energy directly depends only on the marginals and not the joint
distribution, we will write $F^\beta[q]$ where $q$ is a vector of pseudomarginals $q_\alpha(x_\alpha)$ for all $\alpha$ and all $x_\alpha$.
Pseudomarginal space is the convex set [5] of all $q$ that satisfy the positivity and local consistency
constraints,

$$0 < q_\alpha(x_\alpha) < 1 \qquad \sum_{x_\alpha \setminus x_i} q_\alpha(x_\alpha) = q_i(x_i) \qquad \sum_{x_i} q_i(x_i) = 1 \tag{11}$$



2.3 Pseudo-moment matching 

We now wish to correct for the deficiencies of belief propagation by identifying the parameters $\theta$
so that BP produces beliefs $b$ matching the true marginals $p$ of the target distribution $P(x)$. Since
the fixed points of BP are stationary points of $F^\beta$, one may simply try to find parameters $\theta$ that
produce a stationary point in pseudomarginal space at $p$, which is a necessary condition for BP to
reach a fixed point there. Simply evaluate the gradient at $p$, set it to zero, and solve for $\theta$.

Note that in principle this gradient could be used to directly minimize the Bethe free energy, but
$F^\beta$ is a complicated function of $q$ that usually cannot be minimized analytically [8]. In contrast,
here we are using it to solve for the parameters needed to move beliefs to a target location. This is
much easier, since the Bethe free energy is linear in $\theta$. This approach to learning parameters has
been described as 'pseudo-moment matching' [9, 10, 11].

The $L_q$-element vector $q$ is an overcomplete representation of the pseudomarginals because it must
obey the local consistency constraints (11). It is convenient to express the pseudomarginals in terms
of a minimal set of parameters $\eta$ with the smaller dimensionality $L_\theta$, the same as that of $\theta$ and $\phi$, using an affine
transform

$$q = W\eta + k \tag{12}$$

where $W$ is an $L_q \times L_\theta$ rectangular matrix. One example is the expectation parameters
$\eta_\alpha = \sum_{x_\alpha} q_\alpha(x_\alpha) \phi_\alpha(x_\alpha)$, giving the energy simply as $U = -\theta \cdot \eta$. The gradient with respect to
those minimal parameters is

$$\frac{\partial F^\beta}{\partial \eta} = \frac{\partial U}{\partial \eta} - \frac{\partial S^\beta}{\partial q} \frac{\partial q}{\partial \eta} = -\theta - \frac{\partial S^\beta}{\partial q} W \tag{13}$$

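The Bethe entropy gradient is simplest in the overcomplete representation $q$,

$$\frac{\partial S^\beta}{\partial q_\alpha(x_\alpha)} = -1 - \log q_\alpha(x_\alpha) \qquad \frac{\partial S^\beta}{\partial q_i(x_i)} = (-1 - \log q_i(x_i))(1 - d_i) \tag{14}$$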
Setting the gradient (13) to zero, we have a simple linear equation for the parameters $\theta$ that tilt the
Bethe free energy surface (Figure 1A) enough to place a stationary point at the desired marginals $p$:

$$\theta = -\left. \frac{\partial S^\beta}{\partial q} \right|_p W \tag{15}$$
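
For pairwise models, pseudo-moment matching is often written in an equivalent overcomplete form, with node potentials $p_i(x_i)$ and pair potentials $p_{ij}(x_i, x_j) / (p_i(x_i) p_j(x_j))$. The Python sketch below uses that form (an assumption of this illustration, not the minimal-parameter form of (15)) to check that one sum-product sweep started from uniform messages returns uniform messages, so the beliefs equal the target marginals at that fixed point. The function names are hypothetical, and the check says nothing about whether the fixed point is stable.

```python
import itertools
import math


def target_marginals(h, J):
    """Exact node and pair marginals of a pairwise +/-1 model, by enumeration."""
    n = len(h)
    w = {x: math.exp(sum(h[i] * x[i] for i in range(n))
                     + sum(c * x[i] * x[j] for (i, j), c in J.items()))
         for x in itertools.product((-1, 1), repeat=n)}
    Z = sum(w.values())
    p_node = [{s: sum(v for x, v in w.items() if x[i] == s) / Z for s in (-1, 1)}
              for i in range(n)]
    p_pair = {(i, j): {(s, t): sum(v for x, v in w.items() if x[i] == s and x[j] == t) / Z
                       for s in (-1, 1) for t in (-1, 1)}
              for (i, j) in J}
    return p_node, p_pair


def matched_potentials(p_node, p_pair):
    """Potentials psi_i = p_i and psi_ij = p_ij / (p_i p_j)."""
    psi_pair = {e: {(s, t): p[(s, t)] / (p_node[e[0]][s] * p_node[e[1]][t]) for (s, t) in p}
                for e, p in p_pair.items()}
    return p_node, psi_pair


def bp_update(psi_node, psi_pair, msgs):
    """One synchronous sum-product sweep; msgs[(src, dst)][x_dst] is a message value."""
    new = {}
    for (src, dst) in msgs:
        out = {}
        for x_dst in (-1, 1):
            total = 0.0
            for x_src in (-1, 1):
                if (src, dst) in psi_pair:
                    pair = psi_pair[(src, dst)][(x_src, x_dst)]
                else:
                    pair = psi_pair[(dst, src)][(x_dst, x_src)]
                incoming = math.prod(m[x_src] for (k, to), m in msgs.items()
                                     if to == src and k != dst)
                total += psi_node[src][x_src] * pair * incoming
            out[x_dst] = total
        norm = sum(out.values())
        new[(src, dst)] = {s: v / norm for s, v in out.items()}
    return new


def node_beliefs(psi_node, msgs, n):
    """Normalized single-node beliefs from the current messages."""
    beliefs = []
    for i in range(n):
        b = {s: psi_node[i][s] * math.prod(m[s] for (k, to), m in msgs.items() if to == i)
             for s in (-1, 1)}
        z = sum(b.values())
        beliefs.append({s: v / z for s, v in b.items()})
    return beliefs
```

Whether BP actually settles at this fixed point when started elsewhere depends on its stability, which this check does not test.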



2.4 Unbelievable marginals 

It is well known that BP may converge on fixed points that cannot be realized as marginals of any 
joint distribution. In this section we show that the converse is also true: There are some distributions 
whose marginals cannot be realized as beliefs for any set of couplings. In these cases, existing 
methods for learning often yield poor results, sometimes even worse than performing no learning 
at all. This is surprising in view of claims to the contrary: [9, 5] state that belief propagation run
after pseudo-moment matching can always reach a fixed point that reproduces the target marginals. 
While BP does technically have such fixed points, they are not always stable and thus may not be 
reachable by running belief propagation. 

Definition 1. A set of marginals is 'unbelievable' if belief propagation cannot converge to it
for any set of parameters.

For belief propagation to converge to the target (namely, the marginals $p$), a zero gradient is
not sufficient: The Bethe free energy must also be a local minimum [7].$^1$ This requires a positive-definite
Hessian of $F^\beta$ (the 'Bethe Hessian' $H$) in the subspace of pseudomarginals that satisfies the
local consistency constraints. Since the energy $U$ is linear in the pseudomarginals, the Hessian is
given by the second derivative of the Bethe entropy,

$$H = \frac{\partial^2 F^\beta}{\partial \eta^2} = -W^\top \frac{\partial^2 S^\beta}{\partial q^2} W \tag{16}$$

where projection by $W$ constrains the derivatives to the subspace spanned by the minimal parameters
$\eta$. If this Hessian is positive definite when evaluated at $p$ then the parameters given by (15) give
$F^\beta$ a minimum at the target $p$. If not, then the target cannot be a stable fixed point of loopy belief
propagation. In Section 3 we calculate the Bethe Hessian explicitly for a binary model with pairwise
interactions.

Theorem 1. Unbelievable marginal probabilities exist. 



Proof. Proof by example. The simplest unbelievable example is a binary graphical model with
pairwise interactions between four nodes, $x \in \{-1, +1\}^4$, and the energy $E(x) = -J \sum_{(ij)} x_i x_j$.
By symmetry and (1), marginals of this target $P(x)$ are the same for all nodes and pairs: $p_i(x_i) = \frac{1}{2}$
and $p_{ij}(x_i = x_j = +1) = \rho = (2 + 4/(1 + e^{2J} - e^{4J} + e^{6J}))^{-1}$. Substituting these marginals into
the appropriate Bethe Hessian (22) gives a matrix that has a negative eigenvalue for all $\rho > \frac{3}{8}$, or
$J > 0.316$. The associated eigenvector $u$ has the same symmetry as the marginals, with single-node
components $u_i = \frac{1}{2}(-2 + 7\rho - 8\rho^2 + \sqrt{10 - 28\rho + 81\rho^2 - 112\rho^3 + 64\rho^4})$ and pairwise
components $u_{ij} = 1$. Thus the Bethe free energy does not have a minimum at the marginals of these
$P(x)$. Stable fixed points of BP occur only at local minima of the Bethe free energy [7], and so BP
cannot reproduce the marginals $p$ for any parameters. Hence these marginals are unbelievable. □
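
The quantities in this proof can be checked numerically. The following Python sketch is illustrative only: it computes $\rho$ by enumerating the 16 states, builds the Hessian of the Bethe free energy in the coordinates $(q_i^+, q_{ij}^{++})$ of Section 3 at the symmetric point $q_i^+ = \frac{1}{2}$, $q_{ij}^{++} = \rho$ (with $d_i = 3$ pairwise neighbors), and returns the symmetric eigenvector component together with its eigenvalue. The function names, and the closed form used for the eigenvalue, $2(\rho^{-1} + (\frac{1}{2} - \rho)^{-1})(1 - u_i)$, are introduced here for the check and are not taken from the text.

```python
import itertools
import math


def rho_exact(J, n=4):
    """P(x_0 = x_1 = +1) for E(x) = -J * sum_{i<j} x_i x_j, by enumeration."""
    Z = 0.0
    both_up = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(J * sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n)))
        Z += w
        if x[0] == 1 and x[1] == 1:
            both_up += w
    return both_up / Z


def rho_formula(J):
    """Closed form for rho quoted in the proof."""
    return 1.0 / (2 + 4 / (1 + math.exp(2 * J) - math.exp(4 * J) + math.exp(6 * J)))


def bethe_hessian(rho, n=4):
    """Hessian of F^beta at q_i^+ = 1/2, q_ij^{++} = rho; nodes first, then pairs."""
    pairs = list(itertools.combinations(range(n), 2))
    A = 1.0 / rho          # 1/q(+,+), which equals 1/q(-,-) here
    B = 1.0 / (0.5 - rho)  # 1/q(+,-), which equals 1/q(-,+) here
    d = n - 1              # each node has n - 1 pairwise neighbors
    size = n + len(pairs)
    H = [[0.0] * size for _ in range(size)]
    for i in range(n):
        H[i][i] = (1 - d) * 4.0 + d * (A + B)
        for j in range(n):
            if j != i:
                H[i][j] = A
    for a, (i, j) in enumerate(pairs):
        r = n + a
        H[r][r] = 2 * (A + B)
        for node in (i, j):
            H[node][r] = H[r][node] = -(A + B)
    return H


def symmetric_eigenpair(rho):
    """Single-node eigenvector component u_i (pair components are 1) and eigenvalue."""
    u_node = 0.5 * (-2 + 7 * rho - 8 * rho**2
                    + math.sqrt(10 - 28 * rho + 81 * rho**2 - 112 * rho**3 + 64 * rho**4))
    lam = 2 * (1 / rho + 1 / (0.5 - rho)) * (1 - u_node)
    return u_node, lam
```

Multiplying `bethe_hessian(rho)` by the vector with four entries `u_node` followed by six ones should return `lam` times that vector; the eigenvalue is positive for small $J$ (for example 0.2) and negative once $J$ exceeds roughly 0.316.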



Not only do unbelievable marginals exist, but they are actually quite common, as we will see in
Section 3. Graphical models with multinomial or Gaussian variables and at least two loops always
have some pseudomarginals for which the Hessian is not positive definite [12]. On the other hand,
all marginals with sufficiently small correlations are believable because they are guaranteed to have
a positive-definite Bethe Hessian [12]. Stronger conditions have not yet been described.



2.5 Bethe wake-sleep algorithm 

When pseudo-moment matching fails to reproduce unbelievable marginals, an alternative is to use a
gradient descent procedure for learning, analogous to the wake-sleep algorithm used to train
Boltzmann machines [13]. The original rule can be derived as gradient descent of the Kullback-Leibler

$^1$ Even this is not sufficient, but it is necessary.
Figure 1: Landscape of Bethe free energy for the binary graphical model with pairwise interactions.
(A) A slice through the Bethe free energy (solid lines) along one axis $v_1$ of pseudomarginal space,
for three different values of parameters $\theta$. The energy $U$ is linear in the pseudomarginals (dotted
lines), so varying the parameters only changes the tilt of the free energy. This can add or remove
local minima. (B) The second derivatives of the free energies in (A) are all identical. Where the
second derivative is positive, a local minimum can exist (cyan); where it is negative (yellow), no
parameters can produce a local minimum. (C) A two-dimensional slice of the Bethe free energy,
colored according to the minimum eigenvalue $\lambda_{\min}$ of the Bethe Hessian. During a run of Bethe
wake-sleep learning, the beliefs (blue dots) proceed along $v_2$ toward the target marginals $p$. Stable
fixed points of BP can exist only in the believable region (cyan), but the target $p$ resides in an
unbelievable region (yellow). As learning equilibrates, the fixed points jump between believable
regions on either side of the unbelievable zone.



divergence between the target $P(x)$ and the graphical model $Q(x)$ (1),

$$D_{KL}[P \| Q] = \sum_x P(x) \log \frac{P(x)}{Q(x)} = F[P] - F[Q] \tag{17}$$

where $F$ is the Gibbs free energy (5) using the energy function (2). Here we use a new cost
function, the 'Bethe divergence' $D^\beta[p \| b]$, by replacing these free energies by Bethe free energies [14]
evaluated at the true marginals $p$ and at the beliefs $b$ obtained from BP fixed points,

$$D^\beta[p \| b] = F^\beta[p] - F^\beta[b] \tag{18}$$
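We use gradient descent to optimize this cost, with gradient

$$\frac{dD^\beta}{d\theta} = \frac{\partial D^\beta}{\partial \theta} + \frac{\partial D^\beta}{\partial b} \frac{\partial b}{\partial \theta} \tag{19}$$

The data's free energy does not depend on the beliefs, so $\partial F^\beta[p] / \partial b = 0$, and fixed points of
belief propagation are stationary points of the Bethe free energy, so $\partial F^\beta[b] / \partial b = 0$. Consequently
$\partial D^\beta / \partial b = 0$. Furthermore, the entropy terms of the free energies do not depend explicitly on $\theta$, so

$$\frac{dD^\beta}{d\theta} = \frac{\partial U(p)}{\partial \theta} - \frac{\partial U(b)}{\partial \theta} = -\eta(p) + \eta(b) \tag{20}$$

where $\eta(q) = \sum_x q(x) \phi(x)$ are the expectations of the sufficient statistics $\phi(x)$ under the
pseudomarginals $q$. This gradient forms the basis of a simple learning algorithm. At each step in learning,
belief propagation is run, obtaining beliefs $b$ for the current parameters $\theta$. The parameters are then
changed in the opposite direction of the gradient,

$$\Delta\theta = -\epsilon \frac{dD^\beta}{d\theta} = \epsilon (\eta(p) - \eta(b)) \tag{21}$$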

where $\epsilon$ is a learning rate. This generally increases the Bethe free energy for the beliefs while
decreasing that of the data, hopefully allowing BP to draw closer to the data marginals. We call this 
learning rule the Bethe wake-sleep algorithm. 
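
The learning rule (21) can be sketched in a few lines of Python for a pairwise model with $x_i \in \{-1, +1\}$. This is a minimal sketch under stated assumptions, not the authors' implementation: BP is written with node-to-node messages, messages are reset to constants before every BP run, the learning rate and step count are arbitrary, and all function names are hypothetical.

```python
import itertools
import math

S = (-1, 1)


def neighbors(n, J):
    nb = {i: [] for i in range(n)}
    for i, j in J:
        nb[i].append(j)
        nb[j].append(i)
    return nb


def coupling(J, i, j):
    return J[(i, j)] if (i, j) in J else J[(j, i)]


def run_bp(h, J, iters=200, tol=1e-12):
    """Sum-product from constant initial messages; returns node-to-node messages."""
    nb = neighbors(len(h), J)
    m = {(i, j): {s: 0.5 for s in S} for i in nb for j in nb[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in m:
            out = {xj: sum(math.exp(h[i] * xi + coupling(J, i, j) * xi * xj)
                           * math.prod(m[(k, i)][xi] for k in nb[i] if k != j)
                           for xi in S)
                   for xj in S}
            z = sum(out.values())
            new[(i, j)] = {s: v / z for s, v in out.items()}
        change = max(abs(new[e][s] - m[e][s]) for e in m for s in S)
        m = new
        if change < tol:
            break
    return m


def belief_moments(h, J, m):
    """Expectations eta(b): means <x_i> and correlations <x_i x_j> under the beliefs."""
    nb = neighbors(len(h), J)

    def cavity(i, xi, skip=None):
        return math.exp(h[i] * xi) * math.prod(m[(k, i)][xi] for k in nb[i] if k != skip)

    mean = []
    for i in range(len(h)):
        b = {xi: cavity(i, xi) for xi in S}
        mean.append((b[1] - b[-1]) / (b[1] + b[-1]))
    corr = {}
    for (i, j) in J:
        b = {(xi, xj): math.exp(J[(i, j)] * xi * xj) * cavity(i, xi, j) * cavity(j, xj, i)
             for xi in S for xj in S}
        corr[(i, j)] = sum(xi * xj * v for (xi, xj), v in b.items()) / sum(b.values())
    return mean, corr


def exact_moments(h, J):
    """Target moments eta(p) by brute-force enumeration."""
    n = len(h)
    w = {x: math.exp(sum(h[i] * x[i] for i in range(n))
                     + sum(c * x[i] * x[j] for (i, j), c in J.items()))
         for x in itertools.product(S, repeat=n)}
    Z = sum(w.values())
    mean = [sum(x[i] * v for x, v in w.items()) / Z for i in range(n)]
    corr = {(i, j): sum(x[i] * x[j] * v for x, v in w.items()) / Z for (i, j) in J}
    return mean, corr


def bethe_wake_sleep(target_mean, target_corr, edges, n, rate=0.2, steps=2000):
    """Repeat: run BP, then move theta along eta(p) - eta(b), as in (21)."""
    h = [0.0] * n
    J = {e: 0.0 for e in edges}
    for _ in range(steps):
        mean, corr = belief_moments(h, J, run_bp(h, J))
        h = [h[i] + rate * (target_mean[i] - mean[i]) for i in range(n)]
        J = {e: J[e] + rate * (target_corr[e] - corr[e]) for e in edges}
    return h, J
```

On a tree-structured target such as a three-node chain, BP is exact and the loop should recover the generating parameters. On loopy graphs with unbelievable targets the parameters are not expected to settle, so one would record the trajectory rather than only the final values.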

Within this algorithm, there is still the freedom of how to choose initial messages for BP at each 
learning iteration. The result depends on these initial conditions because BP can have several stable 
fixed points. One might re-initialize the messages to a fixed starting point for each run of BP, choose
random initial messages for each run, or restart the messages where they stopped on the previous 
learning step. In our experiments we use the first approach, initializing to constant messages at the 
beginning of each BP run. 

The Bethe wake-sleep learning rule sometimes places a minimum of $F^\beta$ at the true data distribution,
such that belief propagation can give the true marginals as one of its (possibly multiple) fixed points. 
However, for the reasons provided above, this cannot occur where the Bethe Hessian is not positive 
definite. 



2.6 Ensemble belief propagation 

When the Bethe wake-sleep algorithm attempts to learn unbelievable marginals, the parameters 
and beliefs do not reach a fixed point but instead continue to vary over time (Figure 2A,B). Still,
if learning reaches equilibrium, then the temporal average of beliefs is equal to the unbelievable 
marginals. 

Theorem 2. If the Bethe wake-sleep algorithm reaches equilibrium, then unbelievable marginals 
are matched by the belief propagation fixed points averaged over the equilibrium ensemble of pa- 
rameters. 



Proof. At equilibrium, the time average of the parameter changes is zero by definition, $\langle \Delta\theta \rangle_t = 0$.
Substitution of the Bethe wake-sleep equation, $\Delta\theta = \epsilon(\eta(p) - \eta(b(t)))$ (21), directly implies
that $\langle \eta(b(t)) \rangle_t = \eta(p)$. The deterministic affine mapping (12) from the minimal representation to the
pseudomarginals gives $\langle b(t) \rangle_t = p$. □
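
The averaging step can be spelled out by summing the update (21) over learning steps. The following is a sketch under the added assumption that the parameters remain bounded; $T$ (the number of steps) and the time-indexed $\theta(t)$, $b(t)$ are notation introduced here for illustration.

```latex
\begin{align*}
\theta(T) - \theta(0) &= \sum_{t=0}^{T-1} \Delta\theta(t)
  = \epsilon \sum_{t=0}^{T-1} \bigl[ \eta(p) - \eta(b(t)) \bigr] \\
\Longrightarrow \quad
\frac{1}{T} \sum_{t=0}^{T-1} \eta(b(t)) &= \eta(p) - \frac{\theta(T) - \theta(0)}{\epsilon T}
  \;\xrightarrow[T \to \infty]{}\; \eta(p) \\
\frac{1}{T} \sum_{t=0}^{T-1} b(t) &= W \Bigl[ \frac{1}{T} \sum_{t=0}^{T-1} \eta(b(t)) \Bigr] + k
  \;\xrightarrow[T \to \infty]{}\; W \eta(p) + k = p
\end{align*}
```

The last line uses only the fact that (12) is affine, so averaging commutes with the map from $\eta$ to the pseudomarginals.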



After learning has equilibrated, fixed points of belief propagation occur with just the right frequency
so that they can be averaged together to reproduce the target distribution exactly (Figure 2C). Note
that none of the individual fixed points may be close to the true marginals. We call this inference
algorithm ensemble belief propagation (eBP).

Ensemble BP produces perfect marginals by exploiting constant, small-amplitude learning, and
thus assumes that the correct marginals are perpetually available. Yet it also works well when
learning is turned off, if parameters are drawn randomly from a Gaussian distribution with mean
and covariance matched to the equilibrium distribution, $\theta \sim \mathcal{N}(\bar\theta, \Sigma_\theta)$. In the simulations below
(Figures 2C-D, 3B-C), $\Sigma_\theta$ was always low-rank, and only one or two principal components were
needed for good performance. The Gaussian ensemble is not quite as accurate as continued learning
(Figure 3B,C), but the performance is still markedly better than any of the available fixed points.

If the target is not within a convex hull of believable pseudomarginals, then learning cannot reach 
equilibrium: Eventually BP gets as close as it can but there remains a consistent difference $\eta(p) - \eta(b)$,
so $\theta$ must increase without bound. Though possible in principle, we did not observe this effect
in any of our experiments. There may also be no equilibrium if belief propagation at each learning 
iteration fails to converge. 



3 Experiments 

The experiments in this section concentrate on the Ising model: $N$ binary variables, $x \in \{-1, +1\}^N$,
with factors comprising individual variables $x_i$ and pairs $x_i, x_j$. The energy function is
$E(x) = -\sum_i h_i x_i - \sum_{(ij)} J_{ij} x_i x_j$. Then the sufficient statistics are the various first and second moments,
$x_i$ and $x_i x_j$, and the natural parameters are $h_i$, $J_{ij}$. We use this model both for the target
distributions and the model.

We parameterize pseudomarginals as $\{q_i^+, q_{ij}^{++}\}$ where $q_i^+ = q_i(x_i = +1)$ and $q_{ij}^{++} = q_{ij}(x_i = x_j = +1)$ [8].
The remaining probabilities are linear functions of these values. Positivity constraints and
local consistency constraints then appear as $0 < q_i^+ < 1$ and $\max(0, q_i^+ + q_j^+ - 1) < q_{ij}^{++} < \min(q_i^+, q_j^+)$.
If all the interactions are finite, then the inequality constraints are not active [15]. In
Figure 2: Averaging over variable couplings can produce marginals otherwise unreachable by
belief propagation. (A) As learning proceeds, the Bethe wake-sleep algorithm causes parameters to
converge on a discrete limit cycle when attempting to learn unbelievable marginals. (B) The same
limit cycle, projected onto the first two principal components $u_1$ and $u_2$ of $\theta$ during the cycle.
(C) The corresponding beliefs $b$ during the limit cycle (blue circles), projected onto the first two
principal components $v_1$ and $v_2$ of the trajectory through pseudomarginal space. Believable regions
of pseudomarginal space are colored with cyan and the unbelievable regions with yellow, and
inconsistent pseudomarginals are black. Over the limit cycle, the average beliefs $\bar{b}$ (blue ×) are precisely
equal to the target marginals $p$ (black □). The average $\bar{b}$ (red +) over many fixed points of BP (red
dots) generated from randomly perturbed parameters $\bar\theta + \delta\theta$ still produces a better approximation
of the target marginals than any of the individual believable fixed points. (D) Even the best amongst
several BP fixed points cannot match unbelievable marginals (black and grey). Ensemble BP leads
to much improved performance (red and pink).



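this parameterization, the elements of the Bethe Hessian (16) are

$$-\frac{\partial^2 S^\beta}{\partial q_i^+ \partial q_j^+} = \delta_{i,j} (1 - d_i) \left[ (q_i^+)^{-1} + (1 - q_i^+)^{-1} \right] + \delta_{j \in N_i} \left[ (1 - q_i^+ - q_j^+ + q_{ij}^{++})^{-1} \right] + \delta_{i,j} \sum_{k \in N_i} \left[ (q_i^+ - q_{ik}^{++})^{-1} + (1 - q_i^+ - q_k^+ + q_{ik}^{++})^{-1} \right] \tag{22a}$$

$$-\frac{\partial^2 S^\beta}{\partial q_i^+ \partial q_{jk}^{++}} = -\delta_{i,j} \left[ (q_i^+ - q_{ik}^{++})^{-1} + (1 - q_i^+ - q_k^+ + q_{ik}^{++})^{-1} \right] - \delta_{i,k} \left[ (q_i^+ - q_{ij}^{++})^{-1} + (1 - q_i^+ - q_j^+ + q_{ij}^{++})^{-1} \right] \tag{22b}$$

$$-\frac{\partial^2 S^\beta}{\partial q_{ij}^{++} \partial q_{kl}^{++}} = \delta_{ij,kl} \left[ (q_{ij}^{++})^{-1} + (q_i^+ - q_{ij}^{++})^{-1} + (q_j^+ - q_{ij}^{++})^{-1} + (1 - q_i^+ - q_j^+ + q_{ij}^{++})^{-1} \right] \tag{22c}$$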
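
The pair-pair entry (22c) can be sanity-checked against a finite-difference second derivative of a single pair entropy. The Python sketch below is illustrative: it expands $(q_i^+, q_j^+, q_{ij}^{++})$ into the full $2 \times 2$ table, tests the constraints stated above, and compares the closed form with a central difference. The function names and the step size are assumptions made for this example.

```python
import math


def pair_table(qi, qj, qij):
    """Full 2x2 pairwise pseudomarginal from (q_i^+, q_j^+, q_ij^{++})."""
    return {(+1, +1): qij,
            (+1, -1): qi - qij,
            (-1, +1): qj - qij,
            (-1, -1): 1 - qi - qj + qij}


def is_valid(qi, qj, qij):
    """Positivity and local consistency in this parameterization."""
    return 0 < qi < 1 and 0 < qj < 1 and max(0.0, qi + qj - 1) < qij < min(qi, qj)


def pair_entropy(qi, qj, qij):
    return -sum(p * math.log(p) for p in pair_table(qi, qj, qij).values())


def hessian_22c(qi, qj, qij):
    """Closed form: sum of inverse probabilities of the four joint states."""
    return sum(1.0 / p for p in pair_table(qi, qj, qij).values())


def finite_difference_22c(qi, qj, qij, step=1e-4):
    """Minus the central-difference second derivative of the pair entropy in q_ij^{++}."""
    s = pair_entropy
    return -(s(qi, qj, qij + step) - 2 * s(qi, qj, qij) + s(qi, qj, qij - step)) / step**2
```

For instance, at `(0.6, 0.3, 0.25)` the table entries are 0.25, 0.35, 0.05, and 0.35, and the two Hessian estimates should agree to within the finite-difference error.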



Figure 3A shows the fraction of marginals that are unbelievable for 8-node, fully-connected Ising
models with random coupling parameters hi ~ A/"(0, |) and ~ A/"(0, <tj). For ctj > j, most 
marginals cannot be reproduced by belief propagation with any parameters, because the Bethe
Hessian (22) has a negative eigenvalue.

We generated 500 Ising model targets using oj = |, selected the unbelievable ones, and eval- 
uated the performance of BP and ensemble BP for various methods of choosing parameters 0. 
Each run of BP used exponential temporal message damping of 5 time steps [16],
$m^{t+1} = a\, m^t + (1 - a)\, m^{\mathrm{undamped}}$ with $a = e^{-1/5}$. Fixed points were declared when messages changed
by less than $10^{-9}$ on a single time step. We evaluated BP performance for the actual parameters
that generated the target (1), pseudomoment matching (15), and at best-matching beliefs obtained at
any time during Bethe wake-sleep learning. We also measured eBP performance for two parameter
ensembles: the last 100 iterations of Bethe wake-sleep learning, and parameters sampled from a
Gaussian $\mathcal{N}(\bar\theta, \Sigma_\theta)$ with the same mean and covariance as that ensemble.
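
The damping rule and stopping criterion described above can be written compactly. The Python sketch below assumes messages are stored as a flat list of numbers and that `undamped_update` is a user-supplied function returning the plain sum-product update; both names, and the `max_steps` safeguard, are hypothetical.

```python
import math


def damped_update(old, undamped, time_constant=5.0):
    """m^{t+1} = a * m^t + (1 - a) * m^undamped with a = exp(-1 / time_constant)."""
    a = math.exp(-1.0 / time_constant)
    return [a * o + (1 - a) * u for o, u in zip(old, undamped)]


def iterate_to_fixed_point(messages, undamped_update, tol=1e-9, max_steps=10000):
    """Apply damped updates until no message changes by more than tol in one step."""
    for step in range(1, max_steps + 1):
        new = damped_update(messages, undamped_update(messages))
        if max(abs(n - o) for n, o in zip(new, messages)) < tol:
            return new, step
        messages = new
    return messages, max_steps
```

With a time constant of 5 steps each update moves the messages about 18% of the way toward the undamped value, which slows convergence but suppresses oscillations.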

Belief propagation gave a poor approximation of the target marginals, as expected for a model 
with many strong loops. Even with learning, BP could never get the correct marginals, which was 
guaranteed by selection of unbelievable targets. Yet ensemble belief propagation gave excellent 
results. Using the exact parameter ensemble gave orders of magnitude improvement, limited by the 
Figure 3: Performance in learning unbelievable marginals. (A) Fraction of marginals that are
unbelievable. Marginals were generated from fully connected, 8-node binary models with random biases
and pairwise couplings, hi ~ A/"(0, |) and Jy ~ -^dp) <T j)- (B,C) Performance of five models on 
370 unbelievable random target marginals (Section 3), measured with Bethe divergence $D^\beta[p \| b]$
(B) and Euclidean distance \p — b\ (C). Target were generated as in (A) with oj = ~, and selected 
for unbelievability. Bars represent central quartiles, and the white line indicates the median. The five
models are: (i) BP on the graphical model that generated the target distribution, (ii) BP after
parameters are set by pseudomoment matching, (iii) the beliefs with the best performance encountered
during Bethe wake-sleep learning, (iv) eBP using exact parameters from the last 100 iterations of
learning, and (v) eBP with Gaussian-distributed parameters with the same first- and second-order
statistics as (iv).



number of beliefs being averaged. The gaussian parameter ensemble also did much better than even 
the best results of BP. 



4 Discussion 



Other studies have also made use of the Bethe Hessian to draw conclusions about belief propagation. 
For instance, the Hessian reveals that the Ising model's paramagnetic state becomes unstable in 
BP for large enough couplings [17]. For another example, when the Hessian is positive definite
throughout pseudomarginal space, then the Bethe free energy is convex and thus BP has a unique
fixed point [18]. Yet the stronger interpretation appears to be underappreciated: When the Hessian
is not positive definite for some pseudomarginals, then BP can never have a stable fixed point there, for
any parameters.

One might hope that by adjusting the parameters of belief propagation in some systematic way, 
$\theta \to \theta_{\mathrm{BP}}$, one could fix the approximation and so perform exact inference. In this paper we
proved that this is a futile hope, because belief propagation simply can never converge to certain 
marginals. However, we also provided an algorithm that does work: Ensemble belief propagation 
uses BP on several different parameters with different fixed points and averages the results. This 
approach preserves the locality and scalability which make BP so popular, but corrects for some of 
its defects at the cost of running the algorithm a few times. Additionally, it raises the possibility that 
a systematic compensation for the flaws of BP might exist, but only as a mapping from individual 
parameters to an ensemble of parameters, $\theta \to \{\theta_{\mathrm{eBP}}\}$, that could be used in eBP.

An especially clear application of eBP is to discriminative models like Conditional Random Fields
[19]. These models are trained so that known inputs produce known inferences, and then generalize
to draw novel inferences from novel inputs. When belief propagation is used during learning, then 
the model will fail even on known training examples if they happen to be unbelievable. Overall 
performance will suffer. Ensemble BP can remedy those training failures and thus allow better 
performance and more reliable generalization. 

This paper addressed learning in fully-observed models only, where marginals for all variables were 
available during training. Yet unbelievable marginals exist for models with hidden variables as well. 
Ensemble BP should work as in the fully-observed case, but training will require inference over the 
hidden variables during both wake and sleep phases. 
One important inference engine is the brain. When inference is hard, neural computations may resort
to approximations, perhaps including belief propagation [20, 21, 22, 23, 24]. It would be undesirable
for neural circuits to have big blind spots, i.e. reasonable inferences they cannot draw, yet that is
precisely what occurs in BP. By averaging over models with eBP, these blind spots can be eliminated. In
the brain, synaptic weights fluctuate due to a variety of mechanisms. Perhaps such fluctuations allow
averaging over models and thereby reach conclusions unattainable by a deterministic mechanism.